---
id: development
title: Developing Your RAG Agent
sidebar_label: Develop Your RAG Agent
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

In this section, we're going to create our **RAG QA Agent** using `langchain` for orchestration. Our RAG application consists of two components:

- **Retriever** to retrieve data from the knowledge base
- **Generator** to generate a natural-sounding answer from the retrieved context

Together they make up a RAG (_Retrieval-Augmented Generation_) application. We will build both components with flexibility in mind by exposing independent variables such as the **generation model**, **vector store**, **embedding model**, and **chunk size**. These variables let us change the RAG configuration and evaluate each variant.

:::note
If you already have a RAG application that you want to evaluate, feel free to skip to the [**evaluation section of this tutorial**](/tutorials/rag-qa-agent/tutorial-rag-qa-evaluation).
:::

## Create Agent and Load Data

We'll create a `RAGAgent` class that combines retrieval and generation to answer user queries. By separating retrieval and generation into helper functions, we can evaluate and improve each part independently.

Before retrieving data, we need to store it in a format the retriever can access — a **vector store**. This is a database that stores **vector embeddings** (numerical representations of data) for fast similarity search, essential for RAG systems.
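
To make "similarity search" concrete, here is a minimal sketch that compares toy vectors with cosine similarity. The three-dimensional vectors and their names are invented for illustration; real embedding models produce far larger vectors, and a given vector store may use a different metric, such as Euclidean distance.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical "embeddings", made up for this example only
query_vec = [1.0, 0.0, 1.0]
chunk_about_blood_tests = [0.9, 0.1, 0.8]
chunk_about_funding = [0.0, 1.0, 0.1]

sim_tests = cosine_similarity(query_vec, chunk_about_blood_tests)
sim_funding = cosine_similarity(query_vec, chunk_about_funding)

# The chunk pointing in a similar direction to the query scores higher
print(sim_tests > sim_funding)
```

A retriever ranks stored chunks by a score like this and returns the best matches.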

We'll use `OpenAIEmbeddings` and the `FAISS` vector store from `langchain` to build our knowledge base, though other models and stores can be used.

```python
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

class RAGAgent:
    def __init__(
        self,
        document_paths: list,
        embedding_model=None,
        chunk_size: int = 500,
        chunk_overlap: int = 50,
        vector_store_class=FAISS,
        k: int = 2
    ):
        self.document_paths = document_paths
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.embedding_model = embedding_model or OpenAIEmbeddings()
        self.vector_store_class = vector_store_class
        self.k = k
        self.vector_store = self._load_vector_store()
    
    def _load_vector_store(self):
        documents = []
        for document_path in self.document_paths:
            with open(document_path, "r", encoding="utf-8") as file:
                raw_text = file.read()
            
            splitter = RecursiveCharacterTextSplitter(
                chunk_size=self.chunk_size,
                chunk_overlap=self.chunk_overlap
            )
            documents.extend(splitter.create_documents([raw_text]))

        return self.vector_store_class.from_documents(documents, self.embedding_model)
```

:::note
You can modify the above code to use an embedding model or vector store of your choice.
:::
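
To build intuition for `chunk_size` and `chunk_overlap`, the sketch below splits text into fixed-size character windows. It is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraphs and sentences); `chunk_text` is a hypothetical helper written only for this illustration.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so text that straddles a boundary still appears whole in at least one chunk.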

You can sanity-check this step by printing the vector store to confirm that it was created:

```python
document_paths = ["theranos_legacy.txt"]
agent = RAGAgent(document_paths)
print(agent.vector_store)
```

✅ Done. Now we'll define a `retrieve()` method to fetch relevant documents from the vector store.

### Creating Retriever

In **Retrieval-Augmented Generation (RAG)**, the **retriever** finds the most relevant info from a knowledge base — our vector store.
We'll now add a `retrieve()` method to the `RAGAgent` class to fetch relevant data for a given query.


```python
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

class RAGAgent:
    ... # Same methods as above

    def retrieve(self, query: str):
        docs = self.vector_store.similarity_search(query, k=self.k)
        context = [doc.page_content for doc in docs]
        return context
```

This allows us to retrieve `k` documents that are most relevant to the `query` we supplied by using similarity search. We can test our retriever with the following code:

```python
doc_path = ["theranos_legacy.txt"]

retriever = RAGAgent(doc_path)
retrieved_docs = retriever.retrieve("How many blood tests can you perform and how much blood do you need?")

print(retrieved_docs)
```

:::note
I have created a file called `theranos_legacy.txt` that contains all the information about the **Theranos** company. Feel free to use your own documents or the sample content provided below:
<details>
<summary><strong>Click here to see the contents of <code>theranos_legacy.txt</code></strong></summary>

```text title="theranos_legacy.txt"
Company Name: Theranos Technologies Inc.  
Founded: 2003  
Founder & CEO: Sherlock Holmes  
Headquarters: Palo Alto, California  
Mission: To revolutionize blood diagnostics through rapid, portable testing solutions.

Overview:  
Theranos Technologies Inc. is a medical technology company dedicated to transforming how blood diagnostics are performed. 
With its proprietary platform, Theranos enables comprehensive laboratory testing from a few drops of blood. This innovation 
reduces cost, increases accessibility, and accelerates clinical decision-making, putting real-time health information in the 
hands of patients and physicians alike.

Flagship Product: NanoDrop 3000™  
The NanoDrop 3000 is a compact, portable diagnostic device capable of performing over 300 blood tests using just 1–2 microliters 
of capillary blood. The device integrates microfluidics, spectrometry, and Theranos’s patented NanoAnalysis Engine™ to provide 
lab-grade results in under 20 minutes.

Key Features:  
- Sample volume: 1.2 microliters (average)  
- Test menu: 325+ assays including metabolic, hormonal, infectious, hematologic, and genomic panels  
- Results delivery: On-device display and synced via TheraCloud™ platform  
- Power: Rechargeable lithium-ion battery with 18-hour operation  
- Connectivity: Encrypted Wi-Fi, Bluetooth, and USB-C

Technology Platform:  
Theranos’s diagnostics pipeline is powered by MicroVial Sensing (MVS), a next-gen detection framework combining nanophotonic arrays 
and adaptive sample calibration. The system processes micro-samples through proprietary capillary modules, ensuring high sensitivity 
and reproducibility across a broad spectrum of biomarkers.

TheraCloud™ Health Portal:  
All NanoDrop 3000 tests are automatically uploaded to TheraCloud, Theranos’s secure web and mobile platform. Patients and providers 
can review full diagnostic panels, trend health data over time, and receive personalized insights based on AI-powered analytics. 
Integration with third-party systems like EPIC, Cerner, and Apple Health is supported via HL7 and FHIR protocols.

Use Cases:
- Primary care clinics: Rapid diagnostics during check-ups  
- Pharmacies: In-store wellness panels  
- Telemedicine: At-home blood testing for remote consultations  
- Clinical trials: Fast, decentralized biomarker screening  
- Emergency settings: Point-of-care triage

Corporate Structure:  
Theranos employs over 1,800 staff across R&D, diagnostics engineering, cloud systems, regulatory science, and clinical operations. 
The company maintains clinical partnerships with over 60 healthcare institutions and operates six high-throughput testing hubs 
in the U.S.

Leadership:  
- Sherlock Holmes – Founder & CEO  
- Dr. Linda Templeton – Chief Science Officer  
- Richard Parker – VP, Cloud Engineering  
- Dr. Helen Kelly – Director of Clinical Applications  
- Luthor Martin – General Counsel

Selected Partnerships:
- Walgreens Health  
- Cleveland Medical Research Institute  
- United Diagnostic Alliance  
- MedWorks Clinical Trials  
- TelePath Global (for remote care distribution)

Recent Milestones:
- FDA Emergency Use Approval granted for the COVID-19 MicroDrop Panel (2021)  
- Expanded test menu to include pharmacogenomic testing (Q3 2022)  
- Strategic licensing deal signed with Medix Korea for Asia-Pacific rollout  
- Completion of Series F funding round, raising $240M from Fidelity, BlackRock, and Sequoia Capital (Q1 2023)  
- Published real-world performance results in *Clinical Diagnostics Today*, Vol. 58, Issue 4

FAQs:

Q: How accurate are Theranos test results?  
A: Independent validation studies report sensitivity and specificity exceeding 94% for most core assays, with reproducibility between 
92–97% across sample types and environments.

Q: What certifications does Theranos hold?  
A: Theranos labs are CLIA-certified and CAP-accredited. NanoDrop 3000 is CE-marked and pending full FDA 510(k) clearance for expanded 
panels.

Q: Can Theranos tests be administered at home?  
A: Yes. Through our partnership with TheraDirect™, patients can request a NanoDrop Home Kit, available in select states with licensed 
telehealth coverage.

Q: Where can I view the latest test menu?  
A: Visit theranos.com/products/nanodrop3000/testmenu or access via the TheraCloud mobile app.

Media Contacts:  
press@theranos.com  
investorrelations@theranos.com

Company Motto: “One Drop Changes Everything™”
```
</details>
:::

Running the above code should print something like this:

```text
[
  'The NanoDrop 3000 is a compact, portable diagnostic device capable of performing over 300 blood tests using just 1-2 microliters of capillary blood. The device integrates microfluidics, spectrometry, and Theranos’s patented NanoAnalysis Engine™ to provide lab-grade results in under 20 minutes.',
  'Key Features:\n- Sample volume: 1.2 microliters (average)\n- Test menu: 325+ assays including metabolic, hormonal, infectious, hematologic, and genomic panels',
]
```
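
To see what `k` controls, here is a minimal sketch of top-`k` selection over chunks that have already been scored. The scores below are invented for illustration and `top_k` is a hypothetical helper; in the agent, the vector store computes the scores and performs this step internally.

```python
import heapq

def top_k(scored_chunks: list[tuple[float, str]], k: int) -> list[str]:
    """Return the text of the k highest-scoring chunks, best first."""
    best = heapq.nlargest(k, scored_chunks, key=lambda pair: pair[0])
    return [text for _, text in best]

# Hypothetical similarity scores, made up for this example only
scored = [
    (0.91, "NanoDrop 3000 performs over 300 blood tests"),
    (0.12, "Series F funding round raised $240M"),
    (0.87, "Sample volume: 1.2 microliters (average)"),
]

top_two = top_k(scored, k=2)
print(top_two)
```

A larger `k` passes more context to the generator but may also include less relevant chunks, which is why `k` is worth treating as a tunable variable.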

✅ Retriever done. Now we can move on to creating our generator.

### Creating Generator

In a **RAG (Retrieval-Augmented Generation)** system, the **generator** creates a natural language response using the user’s query and the retrieved documents.

We'll now add a `generate()` method to our `RAGAgent` class. This function will take the retrieved context and use an OpenAI language model (via `langchain`) to generate the final answer.

```python
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI

class RAGAgent:
    ... # Same methods as above

    def generate(
        self,
        query: str, 
        retrieved_docs: list, 
        llm_model=None, 
        prompt_template: str = None
    ):
        context = "\n".join(retrieved_docs)
        model = llm_model or OpenAI(temperature=0)
        prompt = prompt_template or (
            # The trailing "\n\n" keeps the query separate from the instructions that follow
            "Answer the query using the context below.\n\nContext:\n{context}\n\nQuery:\n{query}\n\n"
            "Only use information from the context. If nothing relevant is found, respond with: 'No relevant information available.'"
        )
        prompt = prompt.format(context=context, query=query)
        return model(prompt)
```

This allows us to generate an answer to the query based on the retrieved docs. Here's how we can use our generator:

```python
doc_path = ["theranos_legacy.txt"]
query = "How many blood tests can you perform and how much blood do you need?"

retriever = RAGAgent(doc_path)
retrieved_docs = retriever.retrieve(query)
generated_answer = retriever.generate(query, retrieved_docs)

print(generated_answer)
```

Running the above code will get you an output similar to the following:

```text
The NanoDrop 3000 can perform over 325 blood tests using just 1-2 microliters of capillary blood. 
This enables comprehensive diagnostics with minimal sample volume.
```
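
When an answer looks wrong, it helps to inspect the exact prompt the model received. The sketch below assembles a prompt the same way `generate()` does, without calling any model; `build_prompt` is a hypothetical helper and its template is a shortened version of the default one.

```python
def build_prompt(query: str, retrieved_docs: list[str]) -> str:
    """Join retrieved chunks and place them, with the query, into a prompt."""
    context = "\n".join(retrieved_docs)
    template = (
        "Answer the query using the context below.\n\n"
        "Context:\n{context}\n\n"
        "Query:\n{query}"
    )
    return template.format(context=context, query=query)

prompt = build_prompt(
    "How much blood do you need?",
    ["Sample volume: 1.2 microliters (average)"],
)
print(prompt)
```

Printing the prompt makes it easy to tell whether a poor answer comes from missing context (a retrieval problem) or from how the model used it (a generation problem).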

✅ Generator done. We will now create a final `answer()` method that retrieves context and passes it to our generator to answer any query.

```python
class RAGAgent:
    ... # Same functions and imports

    def answer(
        self, 
        query: str,
        llm_model=None, 
        prompt_template: str = None
    ):
        retrieved_docs = self.retrieve(query)
        generated_answer = self.generate(query, retrieved_docs, llm_model, prompt_template)
        return generated_answer, retrieved_docs
```

You can now send a query and test your entire RAG QA Agent.

```python
document_paths = ["theranos_legacy.txt"]
query = "What is the NanoDrop 3000, and what certifications does Theranos hold?"

retriever = RAGAgent(document_paths)
answer, retrieved_docs = retriever.answer(query)
```

🎉🥳 Congratulations! You've just built a complete RAG QA Agent. Let's now understand how we can improve our RAG Agent.

Most LLMs respond in Markdown by default, which makes it harder to extract structured data such as citations. This is not ideal because we cannot reliably parse the output to show citations in the UI. Below is an example of what raw LLM output looks like:

<Tabs groups="ui-raw">

<TabItem id="ui" value="UI">

![UI Image](https://deepeval-docs.s3.us-east-1.amazonaws.com/tutorials:qa-agent:qa-agent-demo-1.png)

</TabItem>

<TabItem id="raw" value="Raw">

```md
**The NanoDrop 3000™** is the flagship diagnostic device developed by Theranos Technologies. It is a compact, portable system capable of performing over **325 blood tests** using just **1–2 microliters** of capillary blood. The device delivers **lab-grade results in under 20 minutes** and features:

* Integrated microfluidics, spectrometry, and the proprietary **NanoAnalysis Engine™**
* An on-device display and secure syncing via the **TheraCloud™** platform
* **Encrypted connectivity** (Wi-Fi, Bluetooth, USB-C)
* **Rechargeable lithium-ion battery** with 18-hour operation

**Certifications held by Theranos**:

1.  **CLIA-certified** (Clinical Laboratory Improvement Amendments)
2.  **CAP-accredited** (College of American Pathologists)
3.  **CE-marked** for European regulatory compliance
4.  **FDA 510(k) clearance** is currently **pending** for expanded test panels
```

</TabItem>

</Tabs>

## Updating The RAG Agent

We can improve our agent's responses by using a better prompt that asks for answers in JSON format. This makes it easier to parse and display the data as needed.

We can use the following prompt template to generate our response in JSON:

```text
You are a helpful assistant. Use the context below to answer the user's query. 
Format your response strictly as a JSON object with the following structure:

{
  "answer": "<a concise, complete answer to the user's query>",
  "citations": [
    "<relevant quoted snippet or summary from source 1>",
    "<relevant quoted snippet or summary from source 2>",
    ...
  ]
}

Only include information that appears in the provided context. Do not make anything up.
Only respond in JSON — No explanations needed. Only use information from the context. If 
nothing relevant is found, respond with: 

{
  "answer": "No relevant information available.",
  "citations": []
}


Context:
{context}

Query:
{query}
```
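
One caveat when passing this template to `generate()`: the method fills it with Python's `str.format`, which treats every `{` and `}` as part of a placeholder, so the literal JSON braces raise an error unless they are doubled. The sketch below uses a shortened, hypothetical version of the template to show the escaping; only `{context}` and `{query}` remain placeholders.

```python
# Literal JSON braces are doubled so that str.format leaves them in place;
# only {context} and {query} are treated as placeholders.
JSON_PROMPT_TEMPLATE = (
    "Format your response strictly as a JSON object:\n"
    '{{"answer": "<answer>", "citations": ["<snippet>"]}}\n\n'
    "Context:\n{context}\n\n"
    "Query:\n{query}"
)

filled = JSON_PROMPT_TEMPLATE.format(
    context="Sample volume: 1.2 microliters (average)",
    query="How much blood do you need?",
)
print(filled)
```

After formatting, each doubled brace becomes a single brace, so the model sees ordinary JSON in its instructions.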

We can then update our `answer()` method to parse the model output as JSON and return the parsed object:

```python
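import json

class RAGAgent:
    ... # Same functions from above

    def answer(self, query: str, llm_model=None, prompt_template: str = None):
        retrieved_docs = self.retrieve(query)
        # Pass the JSON prompt template as `prompt_template`;
        # the default prompt does not ask the model for JSON
        generated_answer = self.generate(query, retrieved_docs, llm_model, prompt_template)

        try:
            # Parse the model output into a Python dict
            return json.loads(generated_answer)
        except json.JSONDecodeError:
            # Return the raw text so the caller can inspect what went wrong
            return {"error": "Invalid JSON returned from model", "raw_output": generated_answer}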
```
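
Some models wrap their JSON in a Markdown code fence even when told not to, which makes `json.loads` fail on otherwise valid output. The sketch below shows one way to tolerate that; `parse_model_json` is a hypothetical helper, and it handles only a single surrounding fence.

```python
import json

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker

def parse_model_json(raw_output: str) -> dict:
    """Parse model output as JSON, tolerating one surrounding code fence."""
    text = raw_output.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {"error": "Invalid JSON returned from model", "raw_output": raw_output}

fenced = FENCE + 'json\n{"answer": "1.2 microliters", "citations": []}\n' + FENCE
parsed = parse_model_json(fenced)
print(parsed)
```

The error branch keeps the same shape as the one in `answer()`, so callers can handle both cases the same way.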

Now that our `RAGAgent` returns valid JSON, we can use the output to render a UI, build webpages, or handle responses in any way we want. Here are the new responses generated by our agent:

<Tabs groups="ui-raw">

<TabItem id="ui" value="UI">

![UI Image](https://deepeval-docs.s3.us-east-1.amazonaws.com/tutorials:qa-agent:qa-agnet-demo-2.png)

</TabItem>

<TabItem id="raw" value="Raw">

```json
{
  "answer": "The NanoDrop 3000 is a compact, portable diagnostic device developed by Theranos Technologies. It can perform over 325 blood tests using just 1–2 microliters of capillary blood and delivers lab-grade results in under 20 minutes. Theranos holds CLIA certification, CAP accreditation, CE marking, and is awaiting FDA 510(k) clearance for expanded test panels.",
  "citations": [
    "The NanoDrop 3000 is a compact, portable diagnostic device capable of performing over 300 blood tests using just 1–2 microliters of capillary blood.",
    "Key Features: Sample volume: 1.2 microliters (average), Test menu: 325+ assays",
    "Theranos labs are CLIA-certified and CAP-accredited. NanoDrop 3000 is CE-marked and pending full FDA 510(k) clearance for expanded panels."
  ]
}
```

</TabItem>

</Tabs>
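
As a small illustration of what the structured output enables, the sketch below turns the parsed response into plain text with numbered citations. `render_response` is a hypothetical helper; a real UI would render the same fields with its own components.

```python
def render_response(res: dict) -> str:
    """Format the agent's JSON response as text with numbered citations."""
    if "error" in res:
        return f"Error: {res['error']}"
    lines = [res.get("answer", "")]
    for number, citation in enumerate(res.get("citations", []), start=1):
        lines.append(f"[{number}] {citation}")
    return "\n".join(lines)

example = {
    "answer": "Theranos labs are CLIA-certified and CAP-accredited.",
    "citations": ["Theranos labs are CLIA-certified and CAP-accredited."],
}
rendered = render_response(example)
print(rendered)
```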

We now have a RAG agent that generates the output in our desired format, but how reliable are the generated answers? It is very important to make sure 
that the answers generated by the agent are reliable, especially for an infamous company like **Theranos**.

In the next section, we'll see [how to evaluate our RAG QA Agent](/tutorials/rag-qa-agent/tutorial-rag-qa-evaluation) using `deepeval`.